Using delta code to produce an incremental update to the CCDD data files for QA. Starting from CCDD release of January 2018, and updating based on DPD extract of Feb 2, 2018 and changes to Ingredient Stem, Unit of Presentation and Combination Products files.
No changes to coded_attribute, or device-ntp.
## Warning in postgresqlExecStatement(conn, statement, ...): RS-DBI driver
## warning: (unrecognized PostgreSQL field type uuid (id:2950) in column 12)
## Warning in postgresqlExecStatement(conn, statement, ...): RS-DBI driver
## warning: (unrecognized PostgreSQL field type uuid (id:2950) in column 0)
The January 2018 CCDD used the following source files:
The February 2018 CCDD generation will use the following source files:
The ingredient stem file has the following changes:
Note that the February file had a number of blank lines that had to be removed before the comparison.
## Warning: Missing values are always removed in SQL.
## Use `min(x, na.rm = TRUE)` to silence this warning
The Combination products file has changes to the NTP_TYPE column only. Note that the February file had differences in Drug Identification Number (no leading zeros), and additional blank rows that complicated this comparison.
The following changes are detected between DPD extracts from January 3, 2018 and February 2, 2018:
library(compareDF)
library(dplyr)
library(dtplyr)
library(dbplyr)
library(data.table)
library(DBI)
drug_20180103 <- tbl(dpd, in_schema("dpd_20180103", "drug_product"))
dpd_product_changes <- dbGetQuery(dpd$con, "SELECT drug_code, case
when s.drug_code is null then 'New drug_code in dpd_20180202'
when t.drug_code is null then 'Drug_code missing in dpd_20180202'
else 'Data Changed'
end as status
from dpd_20180103.drug_product s
full outer join dpd_20180202.drug_product t using (drug_code)
where s is distinct from t") %>% left_join(drug %>% select(drug_code, drug_identification_number), copy = TRUE)
dpd_changes_jan <- drug_20180103 %>% filter(drug_code %in% dpd_product_changes$drug_code) %>% collect()
dpd_changes_feb <- drug %>% filter(drug_code %in% dpd_product_changes$drug_code) %>% collect()
dpd_change_detail <- compare_df(dpd_changes_feb, dpd_changes_jan, "drug_code")
dpd_change_columns <- dpd_change_detail$comparison_df %>%
select(drug_code) %>%
bind_cols(dpd_change_detail$comparison_table_diff) %>%
filter(chng_type == "+") %>%
select(-drug_code1, -chng_type)
datatable(dpd_change_columns)
We need to recalculate any product that is touched by changes in Ingredient Stem or Unit of Presentation. We also need to recalculate any product that was previously in the CCDD but changed in DPD. Need to add in marketed products with date of marketing after CCDD start date. Lastly, we need to add new MPs (in this case there are none because they mostly went to approved status, and the 3 that went right to active are sunscreens).
## Joining, by = "drug_code"
## Joining, by = "drug_identification_number"
## Joining, by = c("drug_identification_number", "drug_code")
## Joining, by = c("extract", "drug_code")
## Joining, by = "ingredient"
## Joining, by = "drug_identification_number"
## Warning: Missing values are always removed in SQL.
## Use `min(x, na.rm = TRUE)` to silence this warning
Now we generate TMs and NTPs and MPs, but just for the subset we identified above.
ntp_dosage_form_map <- collect(ccdd_ntp_dosage_forms)
## Warning in postgresqlExecStatement(conn, statement, ...): RS-DBI driver
## warning: (unrecognized PostgreSQL field type uuid (id:2950) in column 12)
# For each combo of pharmaceutical form and route of administration,
# create some basic summary statistics
# Bryce : Need to capture this intermediate and send to Julie
# The filter restricts the combinations to only those products
# with ingredients flagged in the ingredient stem table
#This is an intermdediate (QA) file
dpd_ccdd_form_route_combinations <- dpd_delta %>%
left_join(ing, copy = TRUE) %>%
select(extract, drug_code, dpd_ingredient = ingredient) %>%
left_join(form, copy = TRUE) %>%
left_join(route, copy = TRUE) %>%
collect() %>%
left_join(ing_stem_feb) %>%
filter(ccdd == "Y") %>%
distinct(drug_code, pharmaceutical_form, route_of_administration) %>%
arrange(pharmaceutical_form, route_of_administration) %>%
group_by(drug_code) %>%
summarize(pharmaceutical_form = paste(unique(pharmaceutical_form), collapse = ", "),
route_of_administration = paste(unique(route_of_administration), collapse = ", ")) %>%
distinct(pharmaceutical_form, route_of_administration)
## Joining, by = "drug_code"
## Joining, by = c("extract", "drug_code")
## Joining, by = c("extract", "drug_code")
## Joining, by = "dpd_ingredient"
# Rows from the dpd_ccdd combos that are not in the ntp_dosage_form_route_map
# This file should be empty
# This is a QA file
missing_form_routes <- anti_join(dpd_ccdd_form_route_combinations, ntp_dosage_form_map)
## Joining, by = c("pharmaceutical_form", "route_of_administration")
mp_with_missing_form_routes <- dpd_delta %>%
distinct(drug_code) %>%
left_join(route, copy = TRUE) %>%
left_join(form, copy = TRUE) %>%
distinct(drug_code, pharmaceutical_form, route_of_administration) %>%
semi_join(missing_form_routes) %>%
group_by(pharmaceutical_form, route_of_administration) %>%
summarize(n_dins = n_distinct(drug_code),
drug_codes = paste(drug_code, collapse = ", "))
## Joining, by = "drug_code"
## Joining, by = c("drug_code", "extract")
## Joining, by = c("route_of_administration", "pharmaceutical_form")
datatable(mp_with_missing_form_routes)
## Warning: Missing values are always removed in SQL.
## Use `min(x, na.rm = TRUE)` to silence this warning
Now we should have everything we need to generate deltas against last months extract
Apply changes to previous month’s dataset
## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## http://rstudio.github.io/DT/server.html
## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## http://rstudio.github.io/DT/server.html